Re: 2NF There are two Definitions which is right



I've included most of Jens original question and rather more of my initial response to Jens question in this response, so that the discussion is more nearly complete in one message.

Jens Haase wrote:
Jonathan Leffler wrote:
Jens Haase wrote:
I found two definitions for 2NF:

1: A relation R(A,F) is in 2NF, when every attribute not
belonging to the primary key of R is fully functionally dependent
on the primary key of R

Can you give a hint as to the significance of the parenthesized "(A,F)"; neither A nor F is referenced in the definition quoted. I don't think it alters the answer, but I'm curious. (The comma after
2NF should be omitted too.)

A beter version of this definition would be "A relation R is in 2NF when every attribute that is not part of a candidate key of R is fully functionally dependent on the candidate key(s) of R".


R(A,F) means a Relation R with attributes A and functional
dependencies F

Fine - as expected, this makes no difference to the answer.

2: A relation schema R is in 2NF if every nonprime attribute A in
R is fully functionally dependent on the primary key of R

Assuming 'nonprime attributes' are the attributes that do not belong to the primary key, it seems to me that the two definitions are the same.

This definition (the second) is from Elmasri/Navate Fundamentals of Database Systems Second Edition Page 411.
On Page 408 in the same book the authors state:


An attribute of relation schema R is called a prime attribute of R if
it is a member of some candidate key of R. An attribute is called
nonprime if it is not a prime attribute—that is, if it is not a
member of any candidate key.

It certainly helps to know the extra background. As I noted in a separate response to Jon Heggland's comment, neither of the definitions clearly allows for multiple candidate keys, though given this definition of non-prime attribute, you can argue that this version does. However, even so, it would be IMNSHO be better phrased if it finished 'dependent on each candidate key of R'.


A superkey [...]

Superkeys are not relevant to most of this discussion, I think.

If a relation schema has more than one key, each is called a
candidate key. One of the candidate keys is arbitrarily designated to
be the primary key, and the others are called secondary keys

Secondary key is not a common term for the non-primary candidate keys (though it is more logical in some respects than 'alternative key', which is the term I'm more familiar with). These days there is no formal reason to distinguish one candidate key as the primary key and the others by another name. Normalization theory is best defined in terms of candidate keys (see Date 'The Primacy of Primary Keys: An Investigation' in 'Relational Database: Writings 1991-1994').


According to this definition there is a difference between the two definitions.

Just about - and the second is the better definition, but not perfect.

According to the first definition the following schema would not
be in 2NF, according to the second it would be:
>>>
>>> R = (A,B,C,D)
>>>
>>> with:
>>> A->D
>>> AB->C
>>> C->D
>>> D->A
>>>
>>> According to the first definition when the primary key is AB, then
>>> A->D violates 2NF, because D is not part of the primary key.
>>
I agree that AB is the primary key - or at least a valid primary key.

AB->C (given); C->D (given); hence AB->D (transitivity); hence AB->CD (union); and AB->ABCD (reflexivity); and therefore AB determines all attributes of R, which makes it a candidate key.


A->D itself cannot violate 2NF; it is a functional dependency, and relation schemas are what satisfy or violate 2NF. And the reasoning the that D is not part of the primary key is bogus too.

Clearly, 'the that' is a typo - sorry. The first sentence is logic chopping, and may not be fair since I suspect English is not Jens's first language; nevertheless, it is also accurate. "According ... AB, then the FD A->D means that R is not in 2NF" would be concise - but wrong. However, the trouble is that the first definition precludes the possibility of multiple candidate keys, and therefore cannot be applied to this relation schema where, as shown below, there are 3 candidate keys. If the first definition is adapted to fix the problem, it becomes equivalent to the second, and the relation R is in 2NF.


I don't think my second sentence makes as much sense as I'd like it to.
However, the reason that R is not in 2NF is not because of the status of D but because of the status of A -- A on its own is not a candidate key, and the fact that A alone determines D is what prevents R from being in 2NF.


A->D tells us that D is only dependant on A and so only on a part of the primary key.

Yes. (And this is a change in stance from yesterday's post.)

As I see it, AB->C and C->D implies that AB->D so the primary key determines both C and D, as required. Granted, it is neither in 3NF nor BCNF - the mnemonic is "the key, the whole key, and nothing but the key", and the functional dependency A->D means that 'the whole key' part of the rule is violated. (The C->D part is also problematic, but doesn't cause the relation schema to violate 2NF.)

By implication, yesterday I said "but R is in 2NF"; today, I think that was wrong.


According to Ullmans definition of 3NF:
A relation R is in third normal form in whenever X -> A holds in R
and A is not in X, then either X is a candidate key for R, or A is
[a] prime attribute

it is in 3NF since:

A->D holds 3NF since D is prime attribute
AB->C holds 3NF because AB is candidate key
C->D holds 3NF because D is prime attribute
D->A holds 3NF because A is prime attribute

Yes. To understand this, we need to establish that AB, BC and CD are candidate keys - which is discussed below. One of the strengths of BCNF over 3NF is its greater simplicity.


Date summarizes this nicely (An Introduction to Database Systems, 7th Edn, p375):

3NF (Zaniolo's definition): Let R be a relvar, let X be any subset of the attributes of R, and let A be any single attribute of R. Then R is in 3NF if and only if, for every FD X->A in R, at least one of the following is true:
1. X contains A (so the FD is trivial);
2. X is a superkey;
3. A is contained in a candidate key of R.


BCNF is obtained from this definition of 3NF by dropping condition 3.
(And I note that superkeys make an appearance here.)

This does not depend on the definition of 2NF; some other characterizations of 3NF have an "if the relvar [relation] is in 2NF and ...." structure.

Since every single attribute on the RHS of an FD in your set satisfies requirement 3, R must be in 3NF. It is surely not in BCNF, though.

[
Somewhat tangentially - at p400, Date gives the following neat characterization of BCNF, 4NF and 5NF:


BCNF: A relvar is in BCNF if and only if every FD  satisfied by R
      is implied by the candidate keys of R.

4NF:  A relvar is in 4NF  if and only if every MVD satisfied by R
      is implied by the candidate keys of R.

5NF:  A relvar is in 5NF  if and only if every JD  satisfied by R
      is implied by the candidate keys of R.
]

According to the second definition R is in 2NF because the
candidate keys are AB, BC, BD and since that, every attribute is
prime attribute, so there is no violation of 2NF.

Now you introduce a term - candidate key - not mentioned in the definitions of 2NF that we're dissecting. I would dispute that BC
is a candidate key anyway; likewise BD. Given a relation schema R {
A, B, C, D, E } with candidate key AB, by definition, AB->CDE (and,
by the rules of trivial functional dependencies (or FDs),
AB->ABCDE). That is, a candidate key functionally determines every
attribute in the relation schema. How do you apply Armstrong's
Axioms to the relation schema and list of FDs to deduce that
BC->AD?
>
Well since C->D and D->A with Armstrongs 3. Axiom we get C->A that
gives us C->DA and with Armstrongs 2. Axiom we get BC->BDA, so BC is
a candidate key.
>
since D->A with Armstrongs 2. Axiom we get BD->AB and with AB->C with
Armstrongs 3. Axiom we get BD->C so we get BD->ABC, so BD also is a candidate key.

OK; I sit corrected - twice.

So, all the attributes are prime attributes, which means that the relation is in 2NF by either definition (or, at least, by either definition as amended) because both definitions only refer to non-prime attributes and your relation schema has no non-prime attributes.

The [w]hole problem i am facing, is that the first definition is from
a german script and the second is from an english book
(Elmasri/Navathe) and my thoug[h]t is, that there is a confusion
between "Primary key" and "prime".

I still think the difference between the two definitions is mainly in that the first implicitly assumes that there is just one candidate key and the second implicitly permits there to be multiple candidate keys, and therefore the second definition is more general and better.


--
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler@xxxxxxxxxxxxx, jleffler@xxxxxxxxxx
Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/
.



Relevant Pages

  • Re: 2NF There are two Definitions which is right
    ... An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R. An attribute is called nonprime if it is not a prime attribute—that is, if it is not a member of any candidate key. ... A superkey of a relation schema is a set of attributes S part of R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1= t2. ... One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys ... Granted, it is neither in 3NF nor BCNF - the mnemonic is "the key, the whole key, and nothing but the key", and the functional dependency A->D means that 'the whole key' part of the rule is violated. ...
    (comp.databases.theory)
  • Re: A pk is *both* a physical and a logical object.
    ... candidate key) nor even the concept of "primary key". ... NOT intend the inference that in order to advocate the selection and use ... a primary key, one had to be of the SQL school of data management. ... reason that may or may not be grounded in relational theory. ...
    (comp.databases.theory)
  • Re: 2NF There are two Definitions which is right - amended
    ... belonging to the primary key of R is fully functionally dependent on the primary key of R ... every attribute that is not part of a candidate key of R is fully ... to this relation schema where, as shown below, there are 3 candidate ...
    (comp.databases.theory)
  • Re: Base Normal Form
    ... >>>There is a function for every candidate key. ... primary key, whether or not that is an implementation concept in the ... >> It can be modeled as many relations as well since a database relation ... >> has unordered columns and a mathematical relation has them ordered. ...
    (comp.databases.theory)
  • Re: What are the differences between the terms, CANDIDATE KEY, PRIMARY KEY, SUPER KEY, COMPOSITE KEY
    ... candidate key thus can contain a single attribute or a collection of ... uniquely and this attribute is called the primary key. ... There is no formal distinction between them. ... from a set of attributes to the whole relation header. ...
    (comp.databases.theory)