Re: candidate keys in abstract parent relations
- From: "Forrest L Norvell" <spankysyourpal@xxxxxxxxx>
- Date: 22 Jan 2006 15:20:08 -0800
vldm10 wrote:
> David Portas wrote:
>
> > Yes it's a bad idea. Here's a generic example. Consider two tuples:
> >
> > a tuple = {'A','B','C'}
> > a tuple = {'A','B','C'}
> >
> > They are the same. It serves no purpose to represent the same
> > information twice, which is why we can safely say that a relvar must
> > have at least one candidate key - the generality of the model isn't
>
> Here we have the following problem: Is it always possible to represent
> the entity's ( or relationship's) instance in a RM? If you know
> for some theory which support these ideas (even if it is a
> philosophical theory) let us know.
> I don't think that regarding the real world situation "we can
> safely say that relvar must have at least one candidate key" as you
> stated. Regarding some abstract mathematical relation, which doesn't
> represent reality, your sentence is true.
It's a mistake to believe that the "real world" and the world of "pure
theory" are somehow incompatible; a theory that provides no help when
confronting practical problems isn't a very useful theory. The
relational model has no problem a priori with surrogate keys, it only
states that a relation cannot contain duplicate values. I may be
mistaken, but the phrase "surrogate key" is an ex post facto addition
to relational database theory and isn't part of the underlying
relational model at all.
A common rule within the database design community is to abjure
surrogate keys, although a lot of totally handwavy rationales are
provided for discouraging them (Celko's explanation that they "expose
the physical layout" of the table is egregious in this regard: any
table with auto-generated sequence identifiers and the ability to
delete rows is going to develop holes in the sequence over time which
aren't reflected in the physical representation of the database, and
I'd be willing to bet at least a few DBMS implementations allow their
instances to move rows around in the physical files with little respect
for the order in which they were initially added). The real problem
with surrogate keys is how frequently they're used as a remedy for a
poorly-considered schema design, due to their automatic and innate
guarantee of row uniqueness (presuming the generator functions aren't
totally broken, which isn't always a safe assumption).
So the "real world" isn't really the problem here: the real-world
solution to the problem of ensuring uniqueness is to use a surrogate
key. But by the same token, the word "surrogate" is important. It can
have (at least) two meanings in this case: a surrogate key can act as a
proxy for another, natural candidate key on the relation, for whatever
reason (surrogate keys make foreign key relationships very
straightforward to express, which is an important consideration in
managing the complexity of highly normalized schemas), and a surrogate
key can also act as a stand-in for an ephemeral or hard-to-capture
notion of uniqueness in the world of this particular database's design.
In both cases (especially the latter), the overall integrity of the
database is incumbent more on the designer's vision and design skills
than depending upon formal methods to rescue designers from their own
bad decisions; this is one of the risks of allowing the use of
surrogate keys in designs. But there are lots of ways to screw up
relational designs, and surrogate keys suffer more than their fair
share of abuse and mystification precisely because of their popularity
and the lazy thinking their use can betray.
After thinking this all through, the methodology I've derived is pretty
simple: I'm going to continue associating surrogate keys with the
tables that aren't defined purely in terms of foreign keys (such as the
tables used in many-to-many associations), but in each case I'm going
to ensure that I've identified (and created a corresponding uniqueness
constraint to enforce) at least one natural key for each table. This is
a practical accomodation of a theoretical constraint: being obligated
to identify a candidate key that emerges from attributes associated
with my entities keeps my tables and design relational, and has the
side benefit of making me confront what defines identity for my
entities, which can result in clearer and more logical designs. At the
same time, my DBMS of choice penalizes me for using non-integer indexes
as primary keys, and I can't imagine the situation improves when
considering the use of primary keys defined over multiple attributes.
> Example:
> In the case that the entity's (or relationship's) instance repeats
> the same values or in the case when we have the two entity's
> (relationship's) instances with the equal values and the relation
> which has "natural key", then it is not possible to represent these
> entities' (relationships') instances in the relation. But, if the
> key is "surrogate key", then we can represent mentioned instances
> in the relation of the RM.
>
> In the case that your database is in a production and that you have
> "natural key" which can't solve above mantioned problems, you can
> consider to redesign your application (this can be very complex and
> with millions of data). This redesign can have the limitations what is
> another long story.
As long as the users of the database can be confident that integrity is
being maintained, it doesn't make sense to go to huge efforts to remove
surrogate keys in the name of design purity. However, if poorly-chosen
surrogate keys / badly-designed tables allow applications to erode
relational integrity, you gotta bite the bullet and fix the problem,
one way or another.
> Let me give you one example from the business environment but from a
> different point of view.
> A Honda dealer has received 100 new 2006 Honda civic cars and these
> 100 cars have the list with the 50 same attributes. You should design
> the Database for these entities using "natural key" and without
> using Vehicle Identification Number (VIN) or any other "surrogate
> key" or "artificial key".
This betrays a misunderstanding (perhaps not yours, I need to add) of
what Codd (and Date, and others) mean by a "surrogate key" -- while a
VIN is clearly a generated, synthetic key, it's not a surrogate,
because it's handed to you by a source external to your database's
universe; surrogates are created and managed internal to the DBMS. The
reason I make this distinction is because I think VINs make an
excellent candidate key for cars: they're guaranteed to be present,
they're guaranteed to be unique, and they're guaranteed to last as long
as the the relevant entity is in existence (a car with its VIN filed
off is very unlikely to remain anything recognizable as the same car,
as anyone who's had their car stolen and chopped learns). Just because
they're incomprehensible gibberish handed to you by an external agency
doesn't make them surrogate keys. This is why Codd's stricture that
surrogate keys not be exposed outside the database is important: a
surrogate key handed out to the world at large ceases to be a surrogate
key, and becomes an artificial identifier that's fair game for use by
parties over which the originating database has little influence or
control. If it meets a set of criteria (some of which I mentioned
above), it's even a *good* choice for a unique identifier, and
therefore a key.
> And you will see that this is
> impossible, because all "fields" have the same corresponding
> values. The point is not that you can't work here with "natural
> key", the point is that you can't even represent these entities in
> the RM if you are using "natural key". Same thing if you are
> selling the Microsoft Windows on CD or the desktop computers...
I think this is really a consequence of having an overly restrictive
definition of what can be included in a natural key. However, one of
the consequences of using generated identifiers is that they end up
being the sole component of the key, due to the principle of necessary
irreducibility. Which is why there's typically something weird / bad
going on if a surrogate identifier is used as an element in a
multi-attribute candidate key.
Forrest L Norvell
.
- References:
- candidate keys in abstract parent relations
- From: Forrest L Norvell
- Re: candidate keys in abstract parent relations
- From: zeldorblat
- Re: candidate keys in abstract parent relations
- From: Forrest L Norvell
- Re: candidate keys in abstract parent relations
- From: Forrest L Norvell
- Re: candidate keys in abstract parent relations
- From: David Portas
- Re: candidate keys in abstract parent relations
- From: vldm10
- candidate keys in abstract parent relations
- Prev by Date: Re: candidate keys in abstract parent relations
- Next by Date: Separation of DDL and DML - was: Early and late binding.
- Previous by thread: Re: candidate keys in abstract parent relations
- Next by thread: Re: candidate keys in abstract parent relations
- Index(es):
Relevant Pages
|