Re: candidate keys in abstract parent relations



vldm10 wrote:
> David Portas wrote:
>
> > Yes it's a bad idea. Here's a generic example. Consider two tuples:
> >
> > a tuple = {'A','B','C'}
> > a tuple = {'A','B','C'}
> >
> > They are the same. It serves no purpose to represent the same
> > information twice, which is why we can safely say that a relvar must
> > have at least one candidate key - the generality of the model isn't
>
> Here we have the following problem: Is it always possible to represent
> the entity's ( or relationship's) instance in a RM? If you know
> for some theory which support these ideas (even if it is a
> philosophical theory) let us know.
> I don't think that regarding the real world situation "we can
> safely say that relvar must have at least one candidate key" as you
> stated. Regarding some abstract mathematical relation, which doesn't
> represent reality, your sentence is true.

It's a mistake to believe that the "real world" and the world of "pure
theory" are somehow incompatible; a theory that provides no help when
confronting practical problems isn't a very useful theory. The
relational model has no problem a priori with surrogate keys, it only
states that a relation cannot contain duplicate values. I may be
mistaken, but the phrase "surrogate key" is an ex post facto addition
to relational database theory and isn't part of the underlying
relational model at all.

A common rule within the database design community is to abjure
surrogate keys, although a lot of totally handwavy rationales are
provided for discouraging them (Celko's explanation that they "expose
the physical layout" of the table is egregious in this regard: any
table with auto-generated sequence identifiers and the ability to
delete rows is going to develop holes in the sequence over time which
aren't reflected in the physical representation of the database, and
I'd be willing to bet at least a few DBMS implementations allow their
instances to move rows around in the physical files with little respect
for the order in which they were initially added). The real problem
with surrogate keys is how frequently they're used as a remedy for a
poorly-considered schema design, due to their automatic and innate
guarantee of row uniqueness (presuming the generator functions aren't
totally broken, which isn't always a safe assumption).

So the "real world" isn't really the problem here: the real-world
solution to the problem of ensuring uniqueness is to use a surrogate
key. But by the same token, the word "surrogate" is important. It can
have (at least) two meanings in this case: a surrogate key can act as a
proxy for another, natural candidate key on the relation, for whatever
reason (surrogate keys make foreign key relationships very
straightforward to express, which is an important consideration in
managing the complexity of highly normalized schemas), and a surrogate
key can also act as a stand-in for an ephemeral or hard-to-capture
notion of uniqueness in the world of this particular database's design.

In both cases (especially the latter), the overall integrity of the
database is incumbent more on the designer's vision and design skills
than depending upon formal methods to rescue designers from their own
bad decisions; this is one of the risks of allowing the use of
surrogate keys in designs. But there are lots of ways to screw up
relational designs, and surrogate keys suffer more than their fair
share of abuse and mystification precisely because of their popularity
and the lazy thinking their use can betray.

After thinking this all through, the methodology I've derived is pretty
simple: I'm going to continue associating surrogate keys with the
tables that aren't defined purely in terms of foreign keys (such as the
tables used in many-to-many associations), but in each case I'm going
to ensure that I've identified (and created a corresponding uniqueness
constraint to enforce) at least one natural key for each table. This is
a practical accomodation of a theoretical constraint: being obligated
to identify a candidate key that emerges from attributes associated
with my entities keeps my tables and design relational, and has the
side benefit of making me confront what defines identity for my
entities, which can result in clearer and more logical designs. At the
same time, my DBMS of choice penalizes me for using non-integer indexes
as primary keys, and I can't imagine the situation improves when
considering the use of primary keys defined over multiple attributes.

> Example:
> In the case that the entity's (or relationship's) instance repeats
> the same values or in the case when we have the two entity's
> (relationship's) instances with the equal values and the relation
> which has "natural key", then it is not possible to represent these
> entities' (relationships') instances in the relation. But, if the
> key is "surrogate key", then we can represent mentioned instances
> in the relation of the RM.
>
> In the case that your database is in a production and that you have
> "natural key" which can't solve above mantioned problems, you can
> consider to redesign your application (this can be very complex and
> with millions of data). This redesign can have the limitations what is
> another long story.

As long as the users of the database can be confident that integrity is
being maintained, it doesn't make sense to go to huge efforts to remove
surrogate keys in the name of design purity. However, if poorly-chosen
surrogate keys / badly-designed tables allow applications to erode
relational integrity, you gotta bite the bullet and fix the problem,
one way or another.

> Let me give you one example from the business environment but from a
> different point of view.
> A Honda dealer has received 100 new 2006 Honda civic cars and these
> 100 cars have the list with the 50 same attributes. You should design
> the Database for these entities using "natural key" and without
> using Vehicle Identification Number (VIN) or any other "surrogate
> key" or "artificial key".

This betrays a misunderstanding (perhaps not yours, I need to add) of
what Codd (and Date, and others) mean by a "surrogate key" -- while a
VIN is clearly a generated, synthetic key, it's not a surrogate,
because it's handed to you by a source external to your database's
universe; surrogates are created and managed internal to the DBMS. The
reason I make this distinction is because I think VINs make an
excellent candidate key for cars: they're guaranteed to be present,
they're guaranteed to be unique, and they're guaranteed to last as long
as the the relevant entity is in existence (a car with its VIN filed
off is very unlikely to remain anything recognizable as the same car,
as anyone who's had their car stolen and chopped learns). Just because
they're incomprehensible gibberish handed to you by an external agency
doesn't make them surrogate keys. This is why Codd's stricture that
surrogate keys not be exposed outside the database is important: a
surrogate key handed out to the world at large ceases to be a surrogate
key, and becomes an artificial identifier that's fair game for use by
parties over which the originating database has little influence or
control. If it meets a set of criteria (some of which I mentioned
above), it's even a *good* choice for a unique identifier, and
therefore a key.

> And you will see that this is
> impossible, because all "fields" have the same corresponding
> values. The point is not that you can't work here with "natural
> key", the point is that you can't even represent these entities in
> the RM if you are using "natural key". Same thing if you are
> selling the Microsoft Windows on CD or the desktop computers...

I think this is really a consequence of having an overly restrictive
definition of what can be included in a natural key. However, one of
the consequences of using generated identifiers is that they end up
being the sole component of the key, due to the principle of necessary
irreducibility. Which is why there's typically something weird / bad
going on if a surrogate identifier is used as an element in a
multi-attribute candidate key.

Forrest L Norvell

.



Relevant Pages

  • Re: candidate keys in abstract parent relations
    ... > But don't forget there is another phase after the logical model and thats ... > This is where surrogate keys really come into play. ... > interface into the applications using the database. ... we have to make so many design compromises. ...
    (comp.databases.theory)
  • Re: Newbie question
    ... the relationship as represented in the database ("current" ... >>> a surrogate key that does not point to the correct object. ... There is an update anomaly when a record is updated using a natural key ... I think we agree that both natural and surrogate keys are ...
    (comp.databases.theory)
  • Re: Natural keys vs Aritficial Keys
    ... of query when surrogate keys were used -- but this was ignored. ... there are cases where the use of surrogate keys ... actually amount to a sizeable amount of "dead" bytes in the database. ... the design center of database design has shifted from the support of an ...
    (comp.databases.theory)
  • Re: What does everyone else do for graphically displaying data?
    ... The matter of keys is complicated by a tendency for some to overuse integer surrogate keys, without identifying the so-called "natural" key that at least ideally identifies the row based on the attributes represented by the columns. ... This surrogate key should be managed by the database - your object should use it, if at all, only as a very low-level handle for a database row, not for the object' own purposes. ... The equals definition for a Point class would compare tuples where a Polyline class could contain a Collection of Points and have an equals definition that compares the contained Points. ...
    (comp.lang.java.databases)
  • Re: Surrogate Keys: an Implementation Issue
    ... The primary key should indeed be a natural key, if one is available, at ... This super primary key, (SPK) is then used when populating FKs ... This seems to give all of the advantages of surrogate keys, ...
    (comp.databases.theory)