Hierarchy as 'UP' constraint
[[[Hierarchy as 'UP' constraint]]]
Many articles (in c.d.t. and elsewhere) and even a
book (Joe Celko's “Trees and hierarchies in SQL for
smarties“) have been written about hierarchies and
relational databases, mostly on how to implement
hierarchies in tables. Let's not immediately do that
and still have the desired effect - which is to manage
data less clumsily when there are hierarchies involved.
Let's first deal with meaning and usage.
A big selling point of the relational model is to
separate the model from the implementation – every
minute spent less on the how gives us more time to
spend on the what – in yet other words: separate
the logical from the physical. When we deal with
hierarchies the distinction seems blurred – especially
when both the specification and the implementation are
in SQL.
While trying to keep the distinction I'll describe an
example that intuitively makes sense – Ill make some
choices which may or may not turn out to be arbitrary.
I'll start with expanding on just one of several adjacency
list (note 1) possibilities mentioned in J.C.'s book here in a
slightly different approach:
Separate the data and the hierarchy, see the hierarchy
strictly as a table constraint.
[[Example]]
Example in imaginary syntax (note 2)
CREATE TABLE TREENODES (
NR NUMBER PRIMARY KEY,
NAME VARCHAR2(50),
PNR up);
Expansion
'up' in the example is (for now – it will be simplified later)
short for these rules:
(0) (note 3). PNR is of the same type as the primary key of the
same table (for simplicity: assume it must be single column):
in this case NUMBER
(1). PNR has a special foreign key-like constraint, referencing
the primary key of the same table. So it is not very foreign,
it is more like domestic: we could call a self referencing
(same table, different column, not necessarily same row) FK a
domestic key constraint, somehow implementing rule (0) and (1)).
(2). When a row X is deleted, all rows Y referencing the deleted
row X are updated to copy the deleted rows X.PNR (instead of X.NR).
(3). No cycles.
What should the PNR of the root be - is this an arbitrary choice?
[[Semantics]]
The data-trees I know depict organization and/or containment.
What should the PNR of the root be? Part one.
Let's see:
It can be the same as NR, it can be NULL, it can be an unique
value denoting it as root.
These alternative representations of root have different
interpretations.
PNR = NR says:
'I am my boss', (organizational) or , 'I am self-contained'
– looking at the rows as containers: things containing other
things – nested set-like.
semantics++.
PNR = NULL says:
'I don't know who my boss is, don't even know
if I have one', or - my boss, existing or not,
is absent. 'Am I contained? '
semantics--.
PNR = unique_value(root) says: 'I am the boss', or
'I contain everything else'.
semantics++, but slightly different from PNR = NR,
more specific.
I wonder if there are more frequently occurring interpretations
for data-trees, beside organization and containment.
[[Pragmatics]]
What should the PNR of the root be? Part two.
PNR = NR
What to do when a row X having X.PNR = X.NR gets
deleted while there are other rows Y having Y.PNR = X.NR?
Interpreted: “What to do when 'I am my boss' X having
subordinates Y gets fired”,
or “when an 'I am self-contained' container containing
other thing gets thrown away”.
'up' - rule (2) would make Y violate the 'domestic key contra int.
Luckily we can still amend rule (2) to read:
(2). When a row X is deleted, all rows Y that referenced the
deleted row X are updated to copy the deleted rows X.PNR
(instead of its NR: Y.PNR := X.PNR) unless the deleted row
had X.NR = X.PNR, in that case Y.PNR := Y.NR.
That makes sense both in the organizational interpretation
and in the container interpretation:
When a boss row is deleted, the immediate subordinates become
their own boss. Professor Celko calls this “send the orphans
to grandmother”. When a container is deleted, it's contents
becomes visible: unwrap the package.
When you want to delete the container and destroy its contents
you'll have to be explicit about it.
A consequence is that the table can hold many unrelated hierarchies
– unless there is a rule (4) (see below).
There seems to be another problem. Rule (3) says: No Cycles.
PNR = NR surely looks like one.
So:
(3). No Cycles. That is, no cycles over more than one row.
A cycle over one row (PNR = NR) has a specific, fixed
interpretation: this row is (a) root.
Yet another consequence is that PNR = NULL is free to mean
the same as in a non-hierarchical context – whatever that
may be (as you can tell I never fully understood NULL).
PNR = NULL
On Rule (2) PNR = NULL works out the same as PNR = NR.
It doesn't need any qualification in the 'No cycles' rule (3).
PNR = special
PNR = special_value(root) suggests a rule (4): 'There can
be only one' (rumbling noise in background) or 'ONE ROOT'.
Then again, such a constraint could be implemented along
with the first two root representations just as hard/easily.
PNR = NR
No need for special values, no need for NULLs – the only
drawback is the necessary qualification of the “No cycles” rule.
Let's proceed with the alternative where PNR = NR says:
'I am my boss'. The rules to interpret 'up' in
CREATE TABLE TREENODES (
NR NUMBER PRIMARY KEY,
NAME VARCHAR2(50),
PNR up);
are now:
'up' is short for:
(1) (note 4). PNR has a 'domestic' key constraint (note 5).
(2). When a row X is deleted, all rows Y (subordinate or contained)
that referenced the deleted row X are updated to copy the deleted
rows X.PNR (instead of what is was before the delete, its X.NR).
We can think of this as 'ON DELETE PROMOTE' or 'ON DELETE EXPOSE'
(3). No cycles over more than one row.
Obviously the 'ON DELETE EXPOSE' and 'ONE ROOT' constraint qualifiers
combined make it impossible to delete the root-row if it has more
than one subordinate row.
[[Implementation]]
It looks very much like TREENODES with PNR = NR for self-contained
containers is implementation: it suggests an adjacency list, and
without anything else, it /is/.
With the 'up' constraint however we can treat the whole as an
implementation neutral specification – logical instead of physical.
I don't want to hold my breath waiting for an SQL standard with
'up' in the sense just discussed. We'll have to check whether
we can also generate other than adjacency list implementations
from this. The imaginary syntax should have room for a choice
of implementation strategy of course.
[[notes]]
1. Giving away the pointe: Whether it really is, is an
implementation issue: 'up' may very well be implemented
with the 'nested intervals' approach. One could imagine
this being steered by syntactical additions in an
'up' constraint sub-clause. However, the adjacency list
is what will be visible anyway.
2. I might have tried a Tutorial D example, but I'm not
familiar enough with Tutorial D yet. I don't think the
example suffers from the weaknesses of SQL to which
Tutorial D has an answer. So for now: SQL.
3. It will become clear why I start with (0) instead
of (1) later on.
4. See? See note 3
5. Domestic key: A self referencing 'foreign' key:
same table, different column, possibly, but not necessarily
same row. Not really necessary, foreign keys will work just
fine – 'foreign' just doesn't sound right.
.
Relevant Pages
- Re: how would I store the XML
... Trees and hierarchies in SQL are fairly complex when it comes to ... So, unless you are using SQL Server 2005, I recommend storing the Xml ... Trees in SQL: ... (microsoft.public.dotnet.xml) - Re: Serious errors with Create view command
... subdepts, groups of subdepts make depts anmd groups of depts make Divisions ... division you need to supply the number of ALL the hierarchies above it as ... Using VFP9 your CREATE VIEW displays like this in the View Designer View SQL ... inventory.subdeptno, inventory.classno, inventory.istyl FROM inventory ... (microsoft.public.fox.helpwanted) - Re: Trying to implent (Joe Celkos) Nested Sets, but need more data columns
... Nested sets are just one of many ways to model such ... >things in SQL. ... If anyone using nonrelational features needs help, ... It's possible to model hierarchies to obtain all the benefits of your ... (microsoft.public.sqlserver.programming) - [PATCH 0/6] Multi-hierarchy Process Containers
... support for multiple hierarchies of containers (up to a limit ... The mount options passed when mounting a container filesystem ... Default is to try to mount all subsystems ... process grouping mechanism that is mature, tested, and well documented ... (Linux-Kernel) - [PATCH 8/9] Containers (V9): Share css_group arrays between tasks with same container memberships
... hierarchies will share a css_group object. ... Assuming that many tasks share the same container assignments, ... struct containerfs_root; ... struct task_struct *task, int subsys_id) ... (Linux-Kernel) |
|