Re: integer id columns for all tables



garhone schrieb:
Hi,

Someone with greater expertise than I recently told me that it is best
to have an integer primary-key for all tables in a database, even if
the table already has some non-integer primary key, or some sort of
composite primary key. If there is already a non-integer key, then
create a new sequence column and set that as the primary key.

Does anyone have an opinion on this and why?

Here are a number of principles for designing identifiers:

- An entity may well exist without properties but not without identifiers.

- An identifier is the first thing to be created before actually the entity itself is created. In other words, we create an identifier and only after that (and if it is really needed) the corresponding entity is created.

- Identifiers have to be constant (in contrast to properties). If you use a kind of system identifier then you are guaranteed then nobody can change it or otherwise misuse.

- Identifiers should not be meaningful because they are intended to be manipulated by the software rather than by users (with the exception of some well known identifiers used for bootstrapping). So identifiers should be hidden from the users.

- Identifiers should be defined before any properties exist. Such a structure will have only identifiers for accessing empty entities. After that we can add meaningful properties. This discipline can ensure the separation between these two concerns.

- If you need uniqueness then use the corresponding constraint. Frequently the need for uniqueness in a property leads to the conclusion that it should be used as an identifier.

Assume that there are two tables Companies and Products with many-to-many relationship implemented via CP table. When we create this join table, it automatically gets some identification means for its elements. Or this format for its identifiers has to be provided manually, for example, as some auto-integers (there are not properties yet). *After that* we can add some properties. In this example we want to add two fields the first pointing to a product and the second pointing to a company. (Notice again that we keep identifiers and properties separately.)

Alternatively, we might want to *optimize* this representation and combine two roles: identification and characterization. In this case we remove our identifier column and then say that the two properties <Company, Product> will identify each element. (Notice that the question is not if we need to *add* an integer identifier but rather if we can *remove* the existing identifier and use our properties instead of it.)

We hope that such records will take less space. It is not always so. The thing is that these records can be referenced by other records and then we need to store two fields instead of only one. Thus records themselves will really take less space but other records will need more space for longer identifiers.

Now assume that somebody wants to change one of the two properties. Here we need to answer the main question: are these columns identifiers or characteristics of the entity. Since we combined them, the answer is twofold. So every new developer of the system will give his own answer with the corresponding consequences.

Another problem is that it not an exception when two records with the same pair of <Company, Product> may exist. Indeed, why not? These are only properties and they can take any values. Or assume that in future this join table will also need to reference more records as its characteristics. Should I add them to the existing two-field identifiers or should I keep them separately. If I add them, then I will have to reorganize the whole model. If not, then for some unknown reasons two of three fields will be used for identification while the third will be not. It does not make sense because all the fields have equal role. Here again keeping separately properties and identifiers will prevent you from their misinterpretation.

As I said already, these arguments do not mean that complex identifiers are not needed. The problem is that using them may well result in an inconsistent behaviour due to ambiguous interpretation of properties. Using primitive identifiers is one possible safe way for solving this problem. It guarantees that you will not have serious problems with your system especially in a year or two when some other developers start extending it.

--
http://conceptoriented.com
.



Relevant Pages

  • Re: integer id columns for all tables
    ... to have an integer primary-key for all tables in a database, ... composite primary key. ... An entity may well exist without properties but not without identifiers. ... separation between these two concerns. ...
    (comp.databases.theory)
  • Re: Primary Key Best Practices Poll
    ... > I believe that when a primary key logically consists of two or more ... any time the natural key gets over about two attributes I'd ... There are better unique identifiers than Identity, ...
    (microsoft.public.sqlserver)