Re: Surrogate Keys: an Implementation Issue




"JOG" <jog@xxxxxxxxxxxxx> wrote in message
news:1154375986.924741.168930@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Brian Selzer wrote:
I prefer to think of a surrogate as more of a temporal attribute. While
its
value identifies a proposition in the database, it also indicates that a
new
proposition became known by the database, allowing it to become part of
the
discussion.

A logfile will indicate when a relvar has a different relation value in
it. One doesn't need a surrogate for that.

What do I mean by that? Surrogates make it possible to
determine whether totally new information is being added to the body of
knowledge or that existing information has had to be altered to reflect
changes in the circumstances underpinning that information.

Totally new information? All updates add totally new information. It's
a totally new relation each time you make an update. You are confusing
RT with the entity/relationship model. The RT is not concerned with
entities but statements of fact. I recommend having another look at the
ORM as it helped me shift out of that mindset.


Yes, totally new. I perceive a significant difference between a proposition
that reflects new circumstances and a proposition that reflects changes in
existing circumstances. For example, if a new employee, John Smith, is
hired, then information about him is totally new and now an additional tuple
must exist to reflect that information; on the other hand, if employee Joyce
Miller gets married and becomes Joyce Harper, then information about her
isn't totally new--much of it was already known by the database prior to the
update, and now a replacement tuple must exist to reflect that information.
In practice, it is often necessary to define extra constraints on added
information. For example, several tasks may need to be performed by
Personnel for a new employee, requiring that the several attributes in the
Employee relation that denote whether or not those tasks have been completed
be set a certain way for a new employee. If all updates add totally new
information, then RT is not sufficient to meet that need, and we must rely
on implementation-specific extensions to define such constraints.

The Relational Model is insufficient in that regard because it does not
require that
primary keys be immutable.

I think that surrogates should be hidden from end-users, but that is more
of
a best practice than a requirement and should be imposed upon and
enforced
by applications and database administrators. I disagree with the notion
that surrogates should "leak" into the real world. They should be
accessible to applications in order to prevent the update problems I've
described in an earlier post, but their values should never appear on a
data
entry screen or document. There are many reasons for this, here are a
couple:

(1) Anything that an end-user can see is subject to a change order. For
example, a customer may object to being referred to by the number "666."
(2) It may be necessary to take a database offline and change the
physical
representation or values for surrogates, such as is the case when
databases
are merged. If a user has written down the key value, he'll complain
that
he can't find the information anymore.

If you hide it, it is a row address. A pointer, if one at the logical
level.


No, it's not.


There should be a second candidate key on every relation. The difference
between OIDs and surrogates is that OIDs are pointers that reference an
instance of an object while a surrogate indicates that a statement of
fact
became known.

OID's reference an instance of a class, not an object,

True.

... but you are
correct that they are pointers. Yet if you hide a surrogate, then there
is little difference, it becomes a pointer to a proposition - the field
itself is no part of that real world proposition, it merely addresses
it. Again I honestly believe it is thinking in terms of entities that
generates the confusion.


Again, the database shouldn't hide them, the applications and DBA should.

There is a significant difference. Yes, a surrogate identifies a
proposition, just like any other candidate key, so in one sense it "points"
to a proposition in the same sense that any candidate key "points" to a
proposition, but the difference manifests itself at the theoretical level,
rather than the implementational level. At the theoretical level,
surrogates enable the propositions in one database state to be correlated
with the propositions in the next, making it possible to differentiate
between added, replaced and retired information during an update, permitting
targeted constraints to be defined.

This means that in an object store there can be object
instances that are identical with exception of the object identifier.

And with a hidden surrogate one can record identical real world
propositions, with the exception of that row identifier. Its the same
mistake imo.


No, one can't. A surrogate is not a surrogate without a second candidate
key; therefore, it is not possible to record identical real world
propositions with a surrogate.

While it may be interesting to be able to count the number of identical
object
instances, the same proposition can't have become known more than once
within the same database state. Relations are sets of propositions, so
it
doesn't make any sense to allow duplicates, since something is either
known
to be true or not.

Neither does it make sense to permit duplicate statements of fact, by
using a hidden surrogate.


A surrogate does not allow duplicate statements of fact. See my previous
comment.

Therefore, I disagree with the idea of using a surrogate
without a second candidate key, even if that key includes every other
attribute in the relation. A surrogate is not a surrogate without a
second
candidate key.

"JOG" <jog@xxxxxxxxxxxxx> wrote in message
news:1154262656.521112.118530@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
This discussion has illuminated me, and I would like to share that. A
row is of course merely a proposition, and is identified by the
predicate fields that make each proposition in that set unique. A
surrogate key similar helps us identify a particular row, so there
initially seems no problem there, and its not an addressing issue.
Rather the issue is that the surrogate _does not exist_ in the real
world, it is initially an artifice of the database, and is not part of
the original proposition.

So whats it doing there? Consider two people with the same names:

Ex ( Name(x, "Sarah Jones") & Marital_status(x, married) ) &
Ey ( Name(y, "Sarah Jones") & Marital_status(y, married) ) &
x != y

We don't have sufficient information to distinguish x and y (note this
specifically refers to the propositions, not the conceptual entities
involved). Perhaps this distinguishing information is impossible to
record, or simply cost ineffective to do so. Anyhow, If we had a
mechanism that seperated propositions x and y (an OID for example),
then how on earth would we know which one to update when the time came?
(Obviously we couldn't, and thats part of the problem with OID's.)

But are surrogates in RM any different? My initial opinion was that the
answer was NO, that they seem to be no different to OID's, except in
their encoding. However gradually my opinion has changed. The reason
for this is that while an OID is always physical, a surrogate key leaks
into the real world almost as soon as it is generated. While it is
initially an artifice, it quickly becomes a real attribute, and that
validates it's use as part of the logical model. Hence the rows become:

Ex ( EmpID(x, 1) & Name(x, "Sarah Jones") & status(x, married) ) &
Ey ( EmpID(x, 2) & Name(y, "Sarah Jones") & status(y, married) )

Noone can tell you what their OID in some database is, but they may
well be able to tell you what their employee/payroll number is.

It hence seems to me that it is the leak in to the real world that is
absolutely _vital_ to validate surrogate use.




.



Relevant Pages

  • Re: what are keys and surrogates?
    ... identifier, i.e. even the database queries only use them in equalities. ... "hiddenness" of attribute values, a principle that in many theoretical ... what it means for a surrogate to be hidden: Brian is quite close in his ...
    (comp.databases.theory)
  • Re: Surrogate Keys: an Implementation Issue
    ... a database that must provide a complete audit trail of every change. ... The SSN is a surrogate that works most of the time. ... Unless the key is immutable, when Bob issues the update, ... by a logical identifier. ...
    (comp.databases.theory)
  • Re: Surrogate Keys: an Implementation Issue
    ... a database that must provide a complete audit trail of every change. ... The SSN is a surrogate that works most of the time. ... Unless the key is immutable, when Bob issues the update, there's no guarantee that the row he's updating is the same one that was read out. ... Bob issues an update statement to change some attribute identified by a logical identifier. ...
    (comp.databases.theory)
  • Re: what are keys and surrogates?
    ... An 'Identifier' is a set of attributes, ... i.e. even the database queries only use them in equalities. ... "hiddenness" of attribute values, a principle that in many theoretical ... what it means for a surrogate to be hidden: Brian is quite close in his ...
    (comp.databases.theory)
  • Re: How should I generate a primary key?
    ... the external reality and verify them. ... be verified for syntax or check digits inside itself. ... A surrogate key is system generated to replace the actual key behind ... with a quote from Dr. Codd: "..Database users ...
    (comp.databases)