Re: Surrogate Keys: an Implementation Issue



"Paul Mansour" <paul@xxxxxxxxxxxxxxxxx> wrote in message
news:1153323860.787392.82180@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
I'm working on a little RDBMs project, and I've been pondering the
general arguments for and against surrogate keys, and have come to
following conclusion:

The primary key should indeed be a natural key, if one is available, at
the model level, but at the implementation level the table should have
a system-supplied, auto-incremented, and hidden, "super primary
key". This super primary key, (SPK) is then used when populating FKs
in other tables (again, hidden from the user). It is up to the DBMS to
do the necessary conversions to insulate the user from knowledge of the
SPK.

If there is no natural key, then the SPK can simply be exposed to the
user as the PK for that table.

Thus, from the user's perspective, it appears that, say, Social
Security Number is sprinkled across a dozen different tables, when
under the covers its just meaningless 32 bit ints, and the SS number is
only stored once in the Employee table.

This seems to give all of the advantages of surrogate keys, (very easy
to change, saves on space) and the advantages of natural keys
(meaningful, etc), and conversely, none of the drawbacks of either. In
other words, the best of both worlds.

Is this a bad conclusion?

Yes. It creates a comforting delusion about what you know about the real
world, as well as creating lots of extra work, and concealing correct
information behind spurious indirection.

The example of SSN numbers constantly comes up in defense of using surrogate
keys, because there are problems with them. However, the fact is that a
database represents the state of the knowledge the business possesses about
its enterprise of interest. The business certainly must know about the
limitations of various natural keys because they are down there on the shop
floor dealing with the reality every day. If they are presented with a
number that purports to be a relevant piece of information (the SSN say)
they are going to write it down--precisely because it is relevant. But they
also know it is in some sense provisional. All they expect is to be able to
correct it in the database so that the database continues to represent their
knowledge of the state of the enterprise of interest--*even when they know
their knowledge may still be wrong*.

The surrogate saves an inconsequential amount of work--namely propagating
the change of value everywhere. But it also costs a lot of coding, testing,
development discipline, and obscurity. Good design says we keep one fact in
one place. If we propagate something to many places then it is only because
it is a foreign key. If it is a foreign key then there should be a foreign
key constraint. And if there is an FK constraint it can be declared ON
UPDATE CASCADE (or whatever the equivalent incantation is in your DDL of
choice). There, three words, job done. Forever.

I do sometimes concede that a surrogate is required, but it is only ever my
last resort, never my first, and I would certainly disagree passionately if
someone made the sweeping recommendation that my "table should have a
system-supplied, auto-incremented, and hidden, 'super primary key'." One
might as well use pointers otherwise--they'd do the same job, and probably a
heck of a lot faster.

There is a reason why we abandoned that approach 35 years ago, and I am old
enough to remember it. :-)

Roy


.



Relevant Pages

  • Little design mistakes that can be easily avoided (2): Listenning to CELKO (and CELKO alikes)
    ... I opened a thread that gave opportunity for CELKO to post the ... What is reality of data? ... subset of attribute would be a natural key*. ... <<4) A surrogate key is system generated to replace the actual key ...
    (comp.databases.theory)
  • Re: A real world example
    ... I disagree that the concept of surrogate vs. natural is useful. ... By this I am unclear what a natural key is a surrogate for. ... It is an arbitrary identifier chosen by my parents. ... to have no control. ...
    (comp.databases.theory)
  • Re: A real world example
    ... I disagree that the concept of surrogate vs. natural is useful. ... By this I am unclear what a natural key is a surrogate for. ... It is an arbitrary identifier chosen by my parents. ... The arguments against natural keys relate mostly to control. ...
    (comp.databases.theory)
  • Re: How should I generate a primary key?
    ... And I have agreed that there are cases where a pseudokey is useful. ... the business case that it should be VERY HARD to change ... change the value of the primary key which is more problematic in the field ... The generated key is used as a primary key, but the natural key still sits ...
    (comp.databases)
  • Re: A real world example
    ... I disagree that the concept of surrogate vs. natural is useful. ... Think of any natural key. ... It is an arbitrary identifier chosen by the IRS to identify tax filings related to my income. ... When any other organization chooses to use those identifiers, they choose to have no control. ...
    (comp.databases.theory)