Re: Surrogate Keys: an Implementation Issue



I prefer to think of a surrogate as more of a temporal attribute. While its
value identifies a proposition in the database, it also indicates that a new
proposition became known by the database, allowing it to become part of the
discussion. What do I mean by that? Surrogates make it possible to
determine whether totally new information is being added to the body of
knowledge or that existing information has had to be altered to reflect
changes in the circumstances underpinning that information. The Relational
Model is insufficient in that regard because it does not require that
primary keys be immutable.

I think that surrogates should be hidden from end-users, but that is more of
a best practice than a requirement and should be imposed upon and enforced
by applications and database administrators. I disagree with the notion
that surrogates should "leak" into the real world. They should be
accessible to applications in order to prevent the update problems I've
described in an earlier post, but their values should never appear on a data
entry screen or document. There are many reasons for this, here are a
couple:

(1) Anything that an end-user can see is subject to a change order. For
example, a customer may object to being referred to by the number "666."
(2) It may be necessary to take a database offline and change the physical
representation or values for surrogates, such as is the case when databases
are merged. If a user has written down the key value, he'll complain that
he can't find the information anymore.

There should be a second candidate key on every relation. The difference
between OIDs and surrogates is that OIDs are pointers that reference an
instance of an object while a surrogate indicates that a statement of fact
became known. This means that in an object store there can be object
instances that are identical with exception of the object identifier. While
it may be interesting to be able to count the number of identical object
instances, the same proposition can't have become known more than once
within the same database state. Relations are sets of propositions, so it
doesn't make any sense to allow duplicates, since something is either known
to be true or not. Therefore, I disagree with the idea of using a surrogate
without a second candidate key, even if that key includes every other
attribute in the relation. A surrogate is not a surrogate without a second
candidate key.

"JOG" <jog@xxxxxxxxxxxxx> wrote in message
news:1154262656.521112.118530@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
This discussion has illuminated me, and I would like to share that. A
row is of course merely a proposition, and is identified by the
predicate fields that make each proposition in that set unique. A
surrogate key similar helps us identify a particular row, so there
initially seems no problem there, and its not an addressing issue.
Rather the issue is that the surrogate _does not exist_ in the real
world, it is initially an artifice of the database, and is not part of
the original proposition.

So whats it doing there? Consider two people with the same names:

Ex ( Name(x, "Sarah Jones") & Marital_status(x, married) ) &
Ey ( Name(y, "Sarah Jones") & Marital_status(y, married) ) &
x != y

We don't have sufficient information to distinguish x and y (note this
specifically refers to the propositions, not the conceptual entities
involved). Perhaps this distinguishing information is impossible to
record, or simply cost ineffective to do so. Anyhow, If we had a
mechanism that seperated propositions x and y (an OID for example),
then how on earth would we know which one to update when the time came?
(Obviously we couldn't, and thats part of the problem with OID's.)

But are surrogates in RM any different? My initial opinion was that the
answer was NO, that they seem to be no different to OID's, except in
their encoding. However gradually my opinion has changed. The reason
for this is that while an OID is always physical, a surrogate key leaks
into the real world almost as soon as it is generated. While it is
initially an artifice, it quickly becomes a real attribute, and that
validates it's use as part of the logical model. Hence the rows become:

Ex ( EmpID(x, 1) & Name(x, "Sarah Jones") & status(x, married) ) &
Ey ( EmpID(x, 2) & Name(y, "Sarah Jones") & status(y, married) )

Noone can tell you what their OID in some database is, but they may
well be able to tell you what their employee/payroll number is.

It hence seems to me that it is the leak in to the real world that is
absolutely _vital_ to validate surrogate use.



.



Relevant Pages

  • Re: Surrogate Keys: an Implementation Issue
    ... value identifies a proposition in the database, it also indicates that a new ... One doesn't need a surrogate for that. ... instances that are identical with exception of the object identifier. ... for this is that while an OID is always physical, ...
    (comp.databases.theory)
  • Re: what are keys and surrogates?
    ... identifier, i.e. even the database queries only use them in equalities. ... "hiddenness" of attribute values, a principle that in many theoretical ... what it means for a surrogate to be hidden: Brian is quite close in his ...
    (comp.databases.theory)
  • Re: Surrogate Keys: an Implementation Issue
    ... a database that must provide a complete audit trail of every change. ... The SSN is a surrogate that works most of the time. ... Unless the key is immutable, when Bob issues the update, ... by a logical identifier. ...
    (comp.databases.theory)
  • Re: Surrogate Keys: an Implementation Issue
    ... a database that must provide a complete audit trail of every change. ... The SSN is a surrogate that works most of the time. ... Unless the key is immutable, when Bob issues the update, there's no guarantee that the row he's updating is the same one that was read out. ... Bob issues an update statement to change some attribute identified by a logical identifier. ...
    (comp.databases.theory)
  • Re: what are keys and surrogates?
    ... An 'Identifier' is a set of attributes, ... i.e. even the database queries only use them in equalities. ... "hiddenness" of attribute values, a principle that in many theoretical ... what it means for a surrogate to be hidden: Brian is quite close in his ...
    (comp.databases.theory)