Re: Surrogate Keys: an Implementation Issue




"paul c" <toledobythesea@xxxxxxxx> wrote in message
news:lWXxg.243067$iF6.132287@xxxxxxxxxxx
Brian Selzer wrote:
"Bob Badour" <bbadour@xxxxxxxxxxxxxxxx> wrote in message
news:PPMxg.21411$pu3.353655@xxxxxxxxxxxxxxxxxxxxxxxxxx
Brian Selzer wrote:

"Bob Badour" <bbadour@xxxxxxxxxxxxxxxx> wrote in message
news:X1gxg.15744$pu3.340040@xxxxxxxxxxxxxxxxxxxxxxxxxx

Brian Selzer wrote:


"Bob Badour" <bbadour@xxxxxxxxxxxxxxxx> wrote in message
news:XQ5xg.14298$pu3.333248@xxxxxxxxxxxxxxxxxxxxxxxxxx


Brian Selzer wrote:


"Bernard Peek" <bap@xxxxxxxxxx> wrote in message
news:xn0ep1e3sn11oo000@xxxxxxxxxxxxxxxxxxxxxx


In comp.databases.theory Paul Mansour wrote:


The second is perhaps more profound. Consider a rollback
database, or
a database that must provide a complete audit trail of every
change.
For example, the database must provide the answer to "who changed
this SS number from X to Y, and when did they change it? As far
as I
can tell, if there is no way to answer this without an immutable
identifier. ( I suppose you could design the DB to handle
specific
cases, but I'm interested in DBMS with native rollback and audit
trail support.)
If what you think of as the key is mutable it's not a natural key.
Natural keys aren't mutable, at all, ever.
Untrue! Natural keys are often mutable--especially compound keys.


The problem here is that there isn't a usable natural key for
identifying people. So all that's left are surrogates of varying
quality. The SSN is a surrogate that works most of the time. If
you
assign someone a payroll number and tell tham that thay won't get
paid
unless they can quote it then you have a close to immutable key.
But
it's still a surrogate.

I think the original post was part correct. Use a natural key when
you
have one. I'd just add to that, don't use a surrogate unless there
is a
compelling reason for it. Efficiency is unlikely to be a
compelling
argument in most cases.
There is always a compelling reason to use surrogates: natural keys
can change. This makes it difficult--if not impossible--to detect
changes to rows. For example, Bob is preparing to update a
row--that is, he has read the row and is in the process of keying
in a change. During that time, another process updated several rows
in the same table. Unless the key is immutable, when Bob issues
the update, there's no guarantee that the row he's updating is the
same one that was read out.
So? Bob issues an update statement to change some attribute
identified by a logical identifier. As long as the data Bob is
entering is correct, why should he care what happens to any other
attributes?
He may be overwriting a change made by another user.
The only relevant concern is whether the update is correct. If the
update is correct, it matters not at all that it overwrites something,
which is presumably no longer correct.



This problem is magnified


if there are rows related via a foreign key constraint because it's
possible for the referenced row to appear unchanged. So you're
left with either maintaining an exclusive lock on the row until Bob
returns
from the golf outing, or adding additional columns and code in
order to
determine with certainty whether or not a change occurred between
the time that a row was read and the time of the update.
Or you can just write better applications that don't update anything
that didn't change. By introducing some 'under-the-covers'
identifying attribute, you create a risk that Bob will change some
data identified by a familiar logical identifier and some other
process in the meantime will associate that identifier with a
different surrogate. Your application will then record the updates
against the wrong logical identifier.
You're missing the point. The change Bob's making may change the
row, but because several changes occurred to the table while Bob was
keying in his change, the row he's about to change may represent some
other entity altogether.
Using logical identity and natural keys, that's not possible. The key
that Bob specifies identifies the entity. And if Bob updates the
entire row, the entire row needs updating. I draw your attention, in
particular, to my observation that one can just write better
applications that don't update anything that didn't change.
Of course it's possible. The key that Bob specifies identifies the
entity at the time of the read from the database. By the time that the
write occurs, that same key now refers to a different entity.
Are you stupid or just ignorant? What you state above can only happen if
one assumes one is using a hidden surrogate. If one uses explicit values
that Bob interacts with, then the value identifies a single entity. That
identity does not change.


From Bob's perspective, what he would expect is that the entity that he's
working on, identified by the key values he used to retrieve it, would
not change; however, since the database is no longer the same when Bob
attempts to commit, the entity that he was working on may either no
longer exist, or may have been changed by another user. During that same
interval, a second entity may have been changed so that by the time Bob
attempts to commit, that second entity may be identified with respect to
the new database state by the exact same key values that Bob read out.
...

I think this is misreading the information principle and perhaps it is due
to falling into the trap of thinking typical lock manager behaviour has
something to do with RT. Even if it's not that, it's a mistake. It
doesn't matter if the rows involved (not necessarily the ones updated)
were deleted twenty times and re-inserted. If the invoking transaction
were required to re-iterate the 'rows' it read whose values it considered
to be crucial for a correct update, then there would less chance of
falling into this trap.


The problem is with the theory, not the mechanisms used to get around its
limitations. As I stated before, the Relational Model doesn't take into
account duration. What do I mean by that? The Relational Model takes into
consideration only two successive database states: the current state, which
is always consistent, and the proposed state, which may or may not be
consistent. These states are separated by a single operation, either a
relational assignment or a multiple assignment. If the proposed state
violates the database predicate, then it is rejected, otherwise, upon
completion of the operation, the proposed state becomes the current state.
The operation occurs instantaneously (at least from a theoretical
standpont), meaning that it has no duration; however, there are no
simultaneous operations, and order is important. (In a concurrent
environment, simultaneous assignments are aggregated into a single multiple
assignment.) Relational assignment is set-based--the new relation value
replaces the old relation value; multiple assignment is a set of relational
assignments and is also set-based--the set of new relation values replaces
the set of old relation values. There are no row-based operations. This
poses several problems. I'll use contrived examples for brevity to
illustrate these problems. To be useful, a database is a representation of
some aspect of reality. Each table is a container for a specific type of
entity, and each row represents a real instance of that type of entity.
Consider the following states for a table that contains people:

Current: Jane Jones Married Proposed: Jane Smith Divorced

Do Jane Jones and Jane Smith represent different people? Or did Jane Jones
get Divorced? Because the only key is mutable, there isn't enough
information to answer that. Another problem involves temporal constraints.
Consider the following states for the people table and a transition
constraint, Single people can't become Divorced:

Current: Jane Jones Married Proposed: Jane Jones Married
Current: Jane Smith Single Proposed: Jane Smith Divorced

Should the proposed state be rejected? Or is it possible that Jane Jones
got Divorced becoming Jane Smith and Jane Smith married Bob Jones? Because
the only key is mutable and because there or no row-based operations, there
isn't enough information to answer that. The lack of row-based operations
means that if all keys are mutable, it is not possible to enforce transition
constraints because there is no way to correlate the rows in the Current
state with those in the Proposed state. Another problem involves the
situation I described above. Add the following states for a children table:

Read: Jane Jones Brian Write: Jane Jones Beth
Read: Jane Jones Lynn Write: Jane Smith Brian
Write: Jane Smith Lynn

When Bob reads Jane Jones' row from the people table, Jane Jones has two
children, Brian and Lynn, but when Bob is ready to update the people table,
Jane Jones only has one child, Beth. Clearly these are two different
people, even though the row in the people table is identical. Because the
only key is mutable, Bob will be updating the wrong row.

It should be obvious that surrogates solve these problems.

p


.



Relevant Pages

  • Re: Surrogate Keys: an Implementation Issue
    ... a database that must provide a complete audit trail of every change. ... The SSN is a surrogate that works most of the time. ... Unless the key is immutable, when Bob issues the update, ... by a logical identifier. ...
    (comp.databases.theory)
  • Re: Surrogate Keys: an Implementation Issue
    ... a database that must provide a complete audit trail of every change. ... There is always a compelling reason to use surrogates: ... For example, Bob is preparing to update a row--that is, he has ... a logical identifier. ...
    (comp.databases.theory)
  • Re: Surrogate Keys: an Implementation Issue
    ... a database that must provide a complete audit trail of every change. ... The SSN is a surrogate that works most of the time. ... Unless the key is immutable, when Bob issues the update, there's no guarantee that the row he's updating is the same one that was read out. ... Bob issues an update statement to change some attribute identified by a logical identifier. ...
    (comp.databases.theory)
  • Re: Surrogate Keys: an Implementation Issue
    ... a database that must provide a complete audit trail of every change. ... The SSN is a surrogate that works most of the time. ... Unless the key is immutable, when Bob issues the update, there's no guarantee that the row he's updating is the same one that was read out. ... By introducing some 'under-the-covers' identifying attribute, you create a risk that Bob will change some data identified by a familiar logical identifier and some other process in the meantime will associate that identifier with a different surrogate. ...
    (comp.databases.theory)
  • Re: bound object frame: picture
    ... Please let me respond to your critique and hopefully help Bob along the way. ... generates can be very large when they contain images. ... Print event NOT THE FORMAT EVENT. ... your entire database will stop working when you get about 100 pictures ...
    (microsoft.public.access.forms)