Re: Newbie question about db normalization theory: redundant keys OK?




"David Portas" <REMOVE_BEFORE_REPLYING_dportas@xxxxxxx> wrote in message
news:HZednQ8PPZsCBP7anZ2dnUVZ8qClnZ2d@xxxxxxxxxxxxxxx
"Brian Selzer" <brian@xxxxxxxxxxxxxxxxxxx> wrote in message
news:prJ8j.53582$eY.44749@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

You are saying that this is a distributed database but that it lacks a
mechanism for accurately propagating changes out to all its nodes?
Well in my view such a DBMS would be broken. It surely violates Codd's
principle of "Distribution Independence". Let's follow your example to
its conclusion though. The solution is to replace whatever copy of the
Blah relation exists in the application with the new Blah relation
that superceded it. Now all emails reach the correct address and there
is no problem that requires a different key.


Forgive me for butting in, David, but where did you come up with the idea
that it is a distributed database? What have disconnected
applications--that is, applications that use something akin to
disconnected ADO recordsets or ADO.NET datasets--to do with distributed
databases?

The question is: for how long is the data that was just read out of the
database considered to be valid? Until the next update? Or is it stale
as soon as its read? Does it have something to do with transaction
control or locking? If several updates occur between the reading of one
piece of information and the reading of another, how can you be sure that
any answer that involves both pieces of information is correct? How can
you be sure that you haven't read the same information twice?

I agree that these are important issues of application design. I don't
think they need to affect the database logical design in this case.


I think they do. For a single state, first order logic suffices; once more
than one state is involved, at a minimum some form of modal logic is
necessary.

If you use an artificial key and a timestamp (or rowversion), then there
can be no doubt as to whether or not the information in question changed
between the first reading and the second.


A ROWVERSION (as defined by Microsoft SQL Server) does not tell you
whether any data has changed. It tells you whether some rows were possibly
affected by update operations. In other words it can give false
positives - indicating a change where there is none.


You're right, of course: it only tells whether some rows were the target of
an update, not whether the information is actually different.

BTW I seriously doubt whether it would be possible or desirable to
implement anything like a ROWVERSION in a true RDBMS. The consequences of
SQL Server's implementation are serious because it attempts to identify
row data based on something other than keys. I have never been a fan of
the ROWVERSION feature.


In what way does it attempt to identify row data based on something other
than keys?

The only way to tell whether the current state of the database equals some
previous state is to query again and compare that to a result previously
retrieved. That comparison is usually based on a key value. It makes no
difference what type of value is used for the key. The comparison is
exactly the same whether it is an "artificial" key or otherwise. (I'm not
too concerned about defining what an "artificial" key is because I don't
think it matters).


I disagree. Unless the identifier is a rigid designator or a rigid definite
description, then any comparison based upon that identifier is suspect. It
may be the case that the key values are identical, but the individuals in
each state that are identified by that key value are different individuals.
For example, "the first person in line" can be different people at different
times.

Consider a set of transformations:

T1:a -> T2:b -> T3:c -> T4:a

Where "T1:a" means "The value of the database at time T1 is a". When we
requery the database at time T4 all we will ever know is that the value is
the same as at time T1. If it is a requirement to know about the previous
updates at T2 and T3 then obviously we ought to preserve that information
in the database - but in many cases it is quite reasonable not to do that.

--
David Portas




.