Re: Surrogate Keys: an Implementation Issue




"Bob Badour" <bbadour@xxxxxxxxxxxxxxxx> wrote in message
news:XQ5xg.14298$pu3.333248@xxxxxxxxxxxxxxxxxxxxxxxxxx
Brian Selzer wrote:

"Bernard Peek" <bap@xxxxxxxxxx> wrote in message
news:xn0ep1e3sn11oo000@xxxxxxxxxxxxxxxxxxxxxx

In comp.databases.theory Paul Mansour wrote:

The second is perhaps more profound. Consider a rollback database, or
a database that must provide a complete audit trail of every change.
For example, the database must provide the answer to "who changed
this SS number from X to Y, and when did they change it? As far as I
can tell, if there is no way to answer this without an immutable
identifier. ( I suppose you could design the DB to handle specific
cases, but I'm interested in DBMS with native rollback and audit
trail support.)

If what you think of as the key is mutable it's not a natural key.
Natural keys aren't mutable, at all, ever.

Untrue! Natural keys are often mutable--especially compound keys.


The problem here is that there isn't a usable natural key for
identifying people. So all that's left are surrogates of varying
quality. The SSN is a surrogate that works most of the time. If you
assign someone a payroll number and tell tham that thay won't get paid
unless they can quote it then you have a close to immutable key. But
it's still a surrogate.

I think the original post was part correct. Use a natural key when you
have one. I'd just add to that, don't use a surrogate unless there is a
compelling reason for it. Efficiency is unlikely to be a compelling
argument in most cases.

There is always a compelling reason to use surrogates: natural keys can
change. This makes it difficult--if not impossible--to detect changes to
rows. For example, Bob is preparing to update a row--that is, he has
read the row and is in the process of keying in a change. During that
time, another process updated several rows in the same table. Unless the
key is immutable, when Bob issues the update, there's no guarantee that
the row he's updating is the same one that was read out.

So? Bob issues an update statement to change some attribute identified by
a logical identifier. As long as the data Bob is entering is correct, why
should he care what happens to any other attributes?


He may be overwriting a change made by another user.


This problem is magnified
if there are rows related via a foreign key constraint because it's
possible for the referenced row to appear unchanged. So you're left with
either maintaining an exclusive lock on the row until Bob returns from
the golf outing, or adding additional columns and code in order to
determine with certainty whether or not a change occurred between the
time that a row was read and the time of the update.

Or you can just write better applications that don't update anything that
didn't change. By introducing some 'under-the-covers' identifying
attribute, you create a risk that Bob will change some data identified by
a familiar logical identifier and some other process in the meantime will
associate that identifier with a different surrogate. Your application
will then record the updates against the wrong logical identifier.


You're missing the point. The change Bob's making may change the row, but
because several changes occurred to the table while Bob was keying in his
change, the row he's about to change may represent some other entity
altogether.

The Relational Model doesn't take into account duration, so the value of a
database at the time of a read is not necessarily the same as that at the
time of a write, nor is the meaning of a key with a specific atomic value at
the time of a read necessarily the same as the meaning of a key with the
same atomic value at the time of a write. Surrogates provide the continuity
across database states that the Relational Model lacks.

It seems to me you just went to a lot of trouble to invent a problem that
never existed in the first place.


I didn't invent this problem. I've run across it several times. It's
extremely difficult to diagnose and even more difficult to repair, since it
lets garbage get into the database.


Natural keys are necessary to maintain the integrity of the information
stored in the database; surrogates are necessary to maintain the
integrity of information during the time that it is in use by
applications.

I disagree entirely.


.



Relevant Pages

  • Re: Surrogate Keys: an Implementation Issue
    ... a database that must provide a complete audit trail of every change. ... Unless the key is immutable, when Bob issues the update, there's no guarantee that the row he's updating is the same one that was read out. ... Bob issues an update statement to change some attribute identified by a logical identifier. ... Surrogates ain't gonna fix that. ...
    (comp.databases.theory)
  • Re: Surrogate Keys: an Implementation Issue
    ... a database that must provide a complete audit trail of every change. ... The SSN is a surrogate that works most of the time. ... Unless the key is immutable, when Bob issues the update, ... by a logical identifier. ...
    (comp.databases.theory)
  • Re: Surrogate Keys: an Implementation Issue
    ... a database that must provide a complete audit trail of every ... when Bob issues ... identified by a logical identifier. ... Do Jane Jones and Jane Smith represent different people? ...
    (comp.databases.theory)
  • Re: Surrogate Keys: an Implementation Issue
    ... a database that must provide a complete audit trail of every change. ... There is always a compelling reason to use surrogates: ... For example, Bob is preparing to update a row--that is, he has ... a logical identifier. ...
    (comp.databases.theory)
  • Re: Surrogate Keys: an Implementation Issue
    ... a database that must provide a complete audit trail of every change. ... The SSN is a surrogate that works most of the time. ... Unless the key is immutable, when Bob issues the update, there's no guarantee that the row he's updating is the same one that was read out. ... Bob issues an update statement to change some attribute identified by a logical identifier. ...
    (comp.databases.theory)

Quantcast