Re: Surrogate Keys: an Implementation Issue
- From: "Brian Selzer" <brian@xxxxxxxxxxxxxxxxxxx>
- Date: Tue, 25 Jul 2006 11:58:25 GMT
"Bob Badour" <bbadour@xxxxxxxxxxxxxxxx> wrote in message
news:X1gxg.15744$pu3.340040@xxxxxxxxxxxxxxxxxxxxxxxxxx
Brian Selzer wrote:
"Bob Badour" <bbadour@xxxxxxxxxxxxxxxx> wrote in message
news:XQ5xg.14298$pu3.333248@xxxxxxxxxxxxxxxxxxxxxxxxxx
Brian Selzer wrote:
"Bernard Peek" <bap@xxxxxxxxxx> wrote in message
news:xn0ep1e3sn11oo000@xxxxxxxxxxxxxxxxxxxxxx
In comp.databases.theory Paul Mansour wrote:
The second is perhaps more profound. Consider a rollback database, or
a database that must provide a complete audit trail of every change.
For example, the database must provide the answer to "who changed
this SS number from X to Y, and when did they change it? As far as I
can tell, if there is no way to answer this without an immutable
identifier. ( I suppose you could design the DB to handle specific
cases, but I'm interested in DBMS with native rollback and audit
trail support.)
If what you think of as the key is mutable it's not a natural key.
Natural keys aren't mutable, at all, ever.
Untrue! Natural keys are often mutable--especially compound keys.
The problem here is that there isn't a usable natural key for
identifying people. So all that's left are surrogates of varying
quality. The SSN is a surrogate that works most of the time. If you
assign someone a payroll number and tell tham that thay won't get paid
unless they can quote it then you have a close to immutable key. But
it's still a surrogate.
I think the original post was part correct. Use a natural key when you
have one. I'd just add to that, don't use a surrogate unless there is a
compelling reason for it. Efficiency is unlikely to be a compelling
argument in most cases.
There is always a compelling reason to use surrogates: natural keys can
change. This makes it difficult--if not impossible--to detect changes
to rows. For example, Bob is preparing to update a row--that is, he has
read the row and is in the process of keying in a change. During that
time, another process updated several rows in the same table. Unless
the key is immutable, when Bob issues the update, there's no guarantee
that the row he's updating is the same one that was read out.
So? Bob issues an update statement to change some attribute identified by
a logical identifier. As long as the data Bob is entering is correct, why
should he care what happens to any other attributes?
He may be overwriting a change made by another user.
The only relevant concern is whether the update is correct. If the update
is correct, it matters not at all that it overwrites something, which is
presumably no longer correct.
This problem is magnified
if there are rows related via a foreign key constraint because it's
possible for the referenced row to appear unchanged. So you're left
with either maintaining an exclusive lock on the row until Bob returns
from the golf outing, or adding additional columns and code in order to
determine with certainty whether or not a change occurred between the
time that a row was read and the time of the update.
Or you can just write better applications that don't update anything that
didn't change. By introducing some 'under-the-covers' identifying
attribute, you create a risk that Bob will change some data identified by
a familiar logical identifier and some other process in the meantime will
associate that identifier with a different surrogate. Your application
will then record the updates against the wrong logical identifier.
You're missing the point. The change Bob's making may change the row,
but because several changes occurred to the table while Bob was keying in
his change, the row he's about to change may represent some other entity
altogether.
Using logical identity and natural keys, that's not possible. The key that
Bob specifies identifies the entity. And if Bob updates the entire row,
the entire row needs updating. I draw your attention, in particular, to my
observation that one can just write better applications that don't update
anything that didn't change.
Of course it's possible. The key that Bob specifies identifies the entity
at the time of the read from the database. By the time that the write
occurs, that same key now refers to a different entity. The logical
identity in the database value that existed at the time of the read is
different from the logical identity in the database value that exists at the
time of the write even though their atomic values are identical. Logical
identity within the value of a database at a specific point in time can be
represented by a natural key. In order to span multiple database values
with an arbitrary number of intermediate values, there must exist on each
relation a key whose value is guaranteed not to change during the time that
Bob is keying in his change--even if he decides to go on vacation. Since a
natural key value can change a number of times in the interval between the
database value that existed at the time of the read and the database value
at the time of the write and since in a concurrent environment these changes
may have originated by a number of users, the natural key value at the time
of the read may not necessarily refer to the same entity at the time of the
write. The logical identity of an entity represented by a key value is
relative to the contents of the relation value to which the entity belongs.
The introduction of an undefined number of intermediate relation values (up
to one for each intermediate database value) between the time that a row is
read and the time that it is written makes it impossible to determine
whether or not the key value about to be written refers to the same entity
for which the key was read.
The only time Bob's change could update the wrong entity is if the
application Bob is using identifies the 'row' using a surrogate key that
Bob cannot see.
I don't think you understand the problem. See above.
The Relational Model doesn't take into account duration
You have yet to establish that it needs to.
, so the value of a
database at the time of a read is not necessarily the same as that at the
time of a write
This only matters if one writes shitty applications with sloppy code. See
my earlier posts.
No. The integrity of the database should not depend on the quality of
application code. The database should reject changes that would cause it to
become corrupt, so shitty applications and sloppy code are not relevant to
this discussion.
, nor is the meaning of a key with a specific atomic value at
the time of a read necessarily the same as the meaning of a key with the
same atomic value at the time of a write.
Your statement is only possible if the value of the key is hidden from the
user. Thus, you invented the problem by introducing a surrogate.
Otherwise, values do not change their meanings. They have the same meaning
for all time.
The problem only exists when there is no surrogate, so your statement makes
no sense.
Atomic values don't change their meaning. But an attribute value can.
Surrogates provide the continuity
across database states that the Relational Model lacks.
I disagree. Hidden surrogates re-introduce all the problems of
location-based updates. The use of logical identity through exposed values
obviates your entire argument. Apparently, you lack the ability to
recognize that fact.
I don't think that surrogates should be hidden--at least not from
applications. They should be hidden from end-users, however.
Logical identity does not necessarily span database values. Apparently, you
lack the ability to recognize that fact.
It seems to me you just went to a lot of trouble to invent a problem that
never existed in the first place.
I didn't invent this problem.
I disagree. Had you not introduced a hidden surrogate, the problem would
never have existed.
Duh! The problem stems from the lack of a surrogate and the inherent
volatility of natural keys.
I've run across it several times. It's
extremely difficult to diagnose and even more difficult to repair, since
it lets garbage get into the database.
Shitty applications with sloppy code force garbage into the database. Stop
writing them.
I didn't write them: I fixed them. Had I written them, they would have used
surrogates and thus would not have exhibited this behavior. But it has
noting to do with how poor an application is. Integrity should be handled
by the database, not the application.
Natural keys are necessary to maintain the integrity of the information
stored in the database; surrogates are necessary to maintain the
integrity of information during the time that it is in use by
applications.
I disagree entirely.
.
- Follow-Ups:
- Re: Surrogate Keys: an Implementation Issue
- From: Bob Badour
- Re: Surrogate Keys: an Implementation Issue
- References:
- Surrogate Keys: an Implementation Issue
- From: Paul Mansour
- Re: Surrogate Keys: an Implementation Issue
- From: Roy Hann
- Re: Surrogate Keys: an Implementation Issue
- From: Paul Mansour
- Re: Surrogate Keys: an Implementation Issue
- From: Bernard Peek
- Re: Surrogate Keys: an Implementation Issue
- From: Brian Selzer
- Re: Surrogate Keys: an Implementation Issue
- From: Bob Badour
- Re: Surrogate Keys: an Implementation Issue
- From: Brian Selzer
- Re: Surrogate Keys: an Implementation Issue
- From: Bob Badour
- Surrogate Keys: an Implementation Issue
- Prev by Date: Re: Surrogate Keys: an Implementation Issue
- Next by Date: Re: Surrogate Keys: an Implementation Issue
- Previous by thread: Re: Surrogate Keys: an Implementation Issue
- Next by thread: Re: Surrogate Keys: an Implementation Issue
- Index(es):