Re: what are keys and surrogates?
- From: Bob Badour <bbadour@xxxxxxxxxxxxxxxx>
- Date: Sun, 27 Jan 2008 22:01:27 -0400
JOG wrote:
On Jan 27, 7:39 pm, rp...@xxxxxxxxxxxxxxxxxxxxxx (rpost) wrote:
(Sorry to be a bit late with my reply.)
no prob. there a million and one things more important than arguing
about dangers of tuple id's! Anyhow I think we've done pretty well to
get this far disagreeing so much and still being civil ;) Go us.
JOG wrote:
[...]
We have some bits of paper with numbers written on (in pencil). We are
storing info about these bits of paper in a database using the schema:
{paperID, Value}. The key, PaperID, is a unique database generated
hidden surrogate.
We have an enumeration:
{ (paperID:1, Value:X), (paperID:2, Value:Y), (paperID:3, Value:Z) }
Fine. Now why does this ID exist? What useful purpose can it possibly
serve? The purpose I see, and that I've been defending here, is that
relations can now use this ID attribute when they want to refer to table
rows, instead of using its "real" attributes (just Value in this case),
and that in doing so, the database designer introduces entities, or
objects if you wish, into the database: it separates changes to paper
(i.e. to an attribute value in the paper relation) and additions/deletions
of papers (i.e. to the paper relation) from changes to the *use* of paper
(how paper is used in other relations).
Yeah, I fully understand your position, its where I once stood. I
think you see the mistake (imho of course) a lot when people move from
OOP to databases, because it's such a shift facing that RM doesn't
have any addresses in it.
So we expect this paperID to occur at least once as an attribute
in another relation.
Someone comes to you the DB admin, with 3 bits of paper and says, ok
the boss has changed the values on some of the bits of paper. What I
have here is one bit of paper with an A on, one with a B and and one
with a Z. Please update the database accordingly.
That is impossible. We need to know how these bits of paper relate
to the existing bits. Can we go by our gut feeling and assume
that the Z bit stays the same while the X bit is relabeled A
and the Y bit relabeled B? Or are the X and Y bits discarded entirely,
and replaced with two new A and B bits? Or is perhaps the X bit
relabeled to B, the Y bit to A, the Z bit discarded, and a new bit
introduced with the label Z? The boss will have to tell us.
The boss cannot tell you, that was the whole point.
Please tell me what you do to deal with this. No redefinitions, or
tangents, just tell me how you ensure that the right PaperID's get the
right updates? How do you known which paperID has changed, and to
which value? How do you even know you have been presented with the
same bits of paper as the initial three, and not new ones altogether?
The DBA would be at a complete loss.
Exactly, but this is not because of the presence of paperIDs;
it's because the boss did not provide that information.
As soon as the boss provides this information, we can think of how
we implement the required updates. In the presence of paperID,
it's easy, for instance,
As I said, the boss can't remember what he did. Just take it as given
that you cannot get the information, and have to deal with what you
see in this example. An update is impossible, and our db is broken.
UPDATE Paper SET Value = "A" WHERE Value = "X";
UPDATE Paper SET Value = "B" WHERE Value = "Y";
Without paperID, what would we have instead?
Nothing at all, if we were happy with just 'value'. We'd also have to
accept that we weren't tracking bits of paper then. In fact the notion
of an individual piece of paper, that existed independently of what
was written on it, wouldn't exist in our model at all. Funny things
entity types.
It took me a long time to grok how this even made any sense, to see
that an entity type is actually _defined_ by what we say consistently
identifies it over different states. I then realised that the entity
type I have in the real world (an individual bit of paper) might not
be the same type I've constructed in the model (a value written on
paper). I didn't need OID's to fix my problems...I just needed to make
sure my entities tallied with my constructs! And that meant I just
needed to use the right identifiers.
Release the power of the grok too reinier! :-)
The Paper table would
not be present at all, and all my foreign key attributes to the
Paper.paperID attribute would just be plain Paper values. So what
would the update look like?
Well, you'd have to know, somehow, which
of all the possible attributes in your tables stand for bits of paper,
and write the same update queries for each of those attributes,
that I wrote just for my Paper table. Not so easy. Why not?
Because this Paper-less schema provides less information about paper
than the one you proposed: it no longer tells us explicitly which
attributes in the database stand for bits of paper.
But wait, you may object, we may be using "abstract domains"
in our database; instead of directly using this enum as value type,
we may have an explicitly declared a Paper value domain that is defined
to be this enum. Some DBMSs promote this approach, e.g. Firebird.
But that only tells us that we should rather be looking at entities
that are a bit more complicated than enums; e.g. we can study what happens
with entities that have ID-based foreign keys *themselves*; e.g. a Paper
table may refer to types of paper, and one of its attributes may be
a reference to its manufacturer.
But wait, you may try again, perhaps the boss has something more
complicated in mind: e.g., the X paper may be *split* into A and B,
with some of its uses being replaced with the use of A paper, and
the other uses with B paper. So your nifty paperID doesn't always
buy you simplicity of updates, nyaah nyaah. Well, I can only agree.
If you do follow this example, then perhaps you could explain why you
are having trouble generalizing it to any schema where the key
attribute is hidden (i.e. non-visible to the business), and all other
attributes are mutable.
The "trouble", as I tried to explain before, arises when a relation exists
of which none of the keys can be decomposed (by following foreign key
references) in such a way that only non-ID attributes remain.
The "trouble" is that you are probably stuck in thinking in OO, like I
once was when I thought C++ was the shizzle to end all shizzles.
Unfortunately to guarantee you can recognize an 'entity' when it
arrives again at your database, you need that stable identifier. Each
"bit of paper" has already has a natural identifier of course, but we
can't use it (it's x,y path), and this is the root of our problem. And
yet all the issues can be solved one single swoop by just...
*Writing the paper_id in permanent marker on the back of each bit of
paper*
Now that the surrogate key isn't hidden, all problems vanish. It
represents the papers x,y path and every bit of paper is simply
identifiable, updates are trivial, and importantly the attribute is no
longer some magical OID or yucky tuple identifier fudge.
But wait... now, it's familiar! ... we have (re)invented the natural key! Woo! Hoo!
.
- References:
- Re: what are keys and surrogates?
- From: Marshall
- Re: what are keys and surrogates?
- From: rpost
- Re: what are keys and surrogates?
- From: JOG
- Re: what are keys and surrogates?
- From: rpost
- Re: what are keys and surrogates?
- From: JOG
- Re: what are keys and surrogates?
- Prev by Date: Re: what are keys and surrogates?
- Next by Date: Re: what are keys and surrogates?
- Previous by thread: Re: what are keys and surrogates?
- Next by thread: Re: What is an automorphism of a database instance?
- Index(es):
Relevant Pages
|