Re: Surrogate Keys: an Implementation Issue
- From: "Brian Selzer" <brian@xxxxxxxxxxxxxxxxxxx>
- Date: Sun, 30 Jul 2006 16:13:12 GMT
I prefer to think of a surrogate as more of a temporal attribute. While its
value identifies a proposition in the database, it also indicates that a new
proposition became known by the database, allowing it to become part of the
discussion. What do I mean by that? Surrogates make it possible to
determine whether totally new information is being added to the body of
knowledge or that existing information has had to be altered to reflect
changes in the circumstances underpinning that information. The Relational
Model is insufficient in that regard because it does not require that
primary keys be immutable.
I think that surrogates should be hidden from end-users, but that is more of
a best practice than a requirement and should be imposed upon and enforced
by applications and database administrators. I disagree with the notion
that surrogates should "leak" into the real world. They should be
accessible to applications in order to prevent the update problems I've
described in an earlier post, but their values should never appear on a data
entry screen or document. There are many reasons for this, here are a
couple:
(1) Anything that an end-user can see is subject to a change order. For
example, a customer may object to being referred to by the number "666."
(2) It may be necessary to take a database offline and change the physical
representation or values for surrogates, such as is the case when databases
are merged. If a user has written down the key value, he'll complain that
he can't find the information anymore.
There should be a second candidate key on every relation. The difference
between OIDs and surrogates is that OIDs are pointers that reference an
instance of an object while a surrogate indicates that a statement of fact
became known. This means that in an object store there can be object
instances that are identical with exception of the object identifier. While
it may be interesting to be able to count the number of identical object
instances, the same proposition can't have become known more than once
within the same database state. Relations are sets of propositions, so it
doesn't make any sense to allow duplicates, since something is either known
to be true or not. Therefore, I disagree with the idea of using a surrogate
without a second candidate key, even if that key includes every other
attribute in the relation. A surrogate is not a surrogate without a second
candidate key.
"JOG" <jog@xxxxxxxxxxxxx> wrote in message
news:1154262656.521112.118530@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
This discussion has illuminated me, and I would like to share that. A
row is of course merely a proposition, and is identified by the
predicate fields that make each proposition in that set unique. A
surrogate key similar helps us identify a particular row, so there
initially seems no problem there, and its not an addressing issue.
Rather the issue is that the surrogate _does not exist_ in the real
world, it is initially an artifice of the database, and is not part of
the original proposition.
So whats it doing there? Consider two people with the same names:
Ex ( Name(x, "Sarah Jones") & Marital_status(x, married) ) &
Ey ( Name(y, "Sarah Jones") & Marital_status(y, married) ) &
x != y
We don't have sufficient information to distinguish x and y (note this
specifically refers to the propositions, not the conceptual entities
involved). Perhaps this distinguishing information is impossible to
record, or simply cost ineffective to do so. Anyhow, If we had a
mechanism that seperated propositions x and y (an OID for example),
then how on earth would we know which one to update when the time came?
(Obviously we couldn't, and thats part of the problem with OID's.)
But are surrogates in RM any different? My initial opinion was that the
answer was NO, that they seem to be no different to OID's, except in
their encoding. However gradually my opinion has changed. The reason
for this is that while an OID is always physical, a surrogate key leaks
into the real world almost as soon as it is generated. While it is
initially an artifice, it quickly becomes a real attribute, and that
validates it's use as part of the logical model. Hence the rows become:
Ex ( EmpID(x, 1) & Name(x, "Sarah Jones") & status(x, married) ) &
Ey ( EmpID(x, 2) & Name(y, "Sarah Jones") & status(y, married) )
Noone can tell you what their OID in some database is, but they may
well be able to tell you what their employee/payroll number is.
It hence seems to me that it is the leak in to the real world that is
absolutely _vital_ to validate surrogate use.
.
- Follow-Ups:
- References:
- Surrogate Keys: an Implementation Issue
- From: Paul Mansour
- Re: Surrogate Keys: an Implementation Issue
- From: Roy Hann
- Re: Surrogate Keys: an Implementation Issue
- From: Paul Mansour
- Re: Surrogate Keys: an Implementation Issue
- From: Bernard Peek
- Re: Surrogate Keys: an Implementation Issue
- From: Brian Selzer
- Re: Surrogate Keys: an Implementation Issue
- From: Bob Badour
- Re: Surrogate Keys: an Implementation Issue
- From: Brian Selzer
- Re: Surrogate Keys: an Implementation Issue
- From: Bob Badour
- Re: Surrogate Keys: an Implementation Issue
- From: Brian Selzer
- Re: Surrogate Keys: an Implementation Issue
- From: Bob Badour
- Re: Surrogate Keys: an Implementation Issue
- From: Brian Selzer
- Re: Surrogate Keys: an Implementation Issue
- From: paul c
- Re: Surrogate Keys: an Implementation Issue
- From: Anith Sen
- Re: Surrogate Keys: an Implementation Issue
- From: David Cressey
- Re: Surrogate Keys: an Implementation Issue
- From: JOG
- Surrogate Keys: an Implementation Issue
- Prev by Date: Re: Surrogate Keys: an Implementation Issue
- Next by Date: Re: Surrogate Keys: an Implementation Issue
- Previous by thread: Re: Surrogate Keys: an Implementation Issue
- Next by thread: Re: Surrogate Keys: an Implementation Issue
- Index(es):
Relevant Pages
|